Parallel Chinese-English Entities, Relations and Events Corpora
Abstract
This paper introduces the parallel Chinese-English Entities, Relations and Events (ERE) corpora developed by the Linguistic Data Consortium under the DARPA Deep Exploration and Filtering of Text (DEFT) Program. Original Chinese newswire and discussion forum documents are annotated for two versions of the ERE task. The texts are then manually translated into English, and the English translations are annotated for the same ERE tasks, resulting in a rich parallel resource useful to performers within the DEFT program, to participants in NIST’s Knowledge Base Population evaluations, and to cross-language projection research more generally.
Similar resources
Contrastive connectors in English and Chinese: A case
This comparative study of however and its Chinese counterparts in two translation corpora (the HLM parallel corpus, and the Babel English-Chinese Parallel Corpus) reveals that the Chinese contrastive relations tend to be expressed implicitly (cf. Wang and Zheng 2004) and Chinese contrastive connectors are generally used in sentence initial position, whereas the English contrastive relations ten...
Automatic Extraction of English Collocations and their Chinese-English Bilingual Examples: A Computational Tool for Bilingual Lexicography
This paper describes the procedures involved in developing EXEC, a web-based system which can automatically extract English collocations and their Chinese-English bilingual examples from parallel corpora. The system draws on statistics, dependency parsing, and Chinese-English parallel corpora of more than 13 million English words and 27 million Chinese characters. By taking a word as well as th...
Using Word Embeddings to Translate Named Entities
In this paper we investigate the usefulness of neural word embeddings in the process of translating Named Entities (NEs) from a resource-rich language to a language low on resources relevant to the task at hand, introducing a novel, yet simple way of obtaining bilingual word vectors. Inspired by observations in (Mikolov et al., 2013b), which show that training their word vector model on compara...
Multi-feature Based Chinese-English Named Entity Extraction from Comparable Corpora
Bilingual Named Entity Extraction is important to cross-language information processing tasks such as machine translation (MT) and cross-lingual information retrieval (CLIR). Much previous work has extracted bilingual Named Entities from parallel corpora. Here we propose a multi-feature based method to extract bilingual Named Entities from comparable corpora. We first recognize the Chinese and Eng...
Mining Large-scale Parallel Corpora from Multilingual Patents: An English-Chinese example and its application to SMT
In this paper, we demonstrate how to mine large-scale parallel corpora from multilingual patents, which have not been thoroughly explored before. We show how a large-scale English-Chinese parallel corpus containing over 14 million sentence pairs, of which only 1-5% are wrong, can be mined from a large amount of English-Chinese bilingual patents. To our knowledge, this is the largest single parallel corpu...